“I tried to make some random plots for the project, and at first I did not fully understand the data I was working with. My metadata contained both Shannon entropy and Shannon entropy on functions, and I did not realize that these are two separate things (one relates to taxonomy, the other to the functions of the gut microbiota).
What could be improved:
Understanding the data better and choosing a different variable to plot
Labeling both axes
Choosing different colours: instead of green, there should be a colour that stands out more against blue (maybe red or orange)
Maybe also increasing the transparency so that overlapping data are more visible
Changing the title of the plot, because the current one says nothing”
Improved plots
As I mentioned in my previous assignment, I did not understand the data I was working with, so the plots presented here are ones I made after gaining a deeper understanding of the data.
(My data concern the gut microbiome.)
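To make the quantity behind both metadata columns concrete, here is a minimal sketch of how Shannon entropy can be computed from a vector of abundances. The function name `shannon_entropy` and the example counts are hypothetical, and the natural logarithm is an assumption; the log base used in the actual metadata is not stated. The same formula can be applied to taxon abundances or to function abundances, which is why the dataset holds two separate entropy columns.

```r
# Shannon entropy H = -sum(p * log(p)) for a vector of non-negative abundances.
# Assumption: natural logarithm; the dataset may use a different base.
shannon_entropy <- function(abundance) {
  p <- abundance / sum(abundance)  # convert counts to relative abundances
  p <- p[p > 0]                    # zero abundances contribute nothing
  -sum(p * log(p))
}

# Four equally abundant taxa give the maximum entropy for four taxa.
even_community <- shannon_entropy(c(25, 25, 25, 25))

# One dominant taxon gives a much lower entropy.
skewed_community <- shannon_entropy(c(97, 1, 1, 1))

print(c(even = even_community, skewed = skewed_community))
```

A higher value means the community is more evenly spread across taxa (or functions), and a lower value means a few of them dominate.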
library(tidyr)
Warning: package 'tidyr' was built under R version 4.4.3
library(dplyr)
Warning: package 'dplyr' was built under R version 4.4.3
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(readr)
Warning: package 'readr' was built under R version 4.4.3
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.4.3
library(plotly)
Warning: package 'plotly' was built under R version 4.4.3
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
1) Loading data
meta <- read.csv("metadata.csv", sep = "\t")
View(meta)
taxa <- read.csv("taxa.csv", sep = "\t")
View(taxa)
2) Creating histograms
shan_entr <- ggplot(data = meta, aes(x = Shannon.Entropy))
shan_entr +
  geom_histogram(color = "black", fill = "blue") +
  labs(title = "Histogram Shannon Entropy",
       x = "Shannon Entropy",
       y = "Frequency")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
shan_entr_func <- ggplot(data = meta, aes(x = Shannon.Entropy.on.Functions))
shan_entr_func +
  geom_histogram(color = "black", fill = "blue") +
  labs(title = "Histogram Shannon Entropy on Functions",
       x = "Shannon Entropy on Functions",
       y = "Frequency")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The plots show the distributions of the two Shannon entropies and of the GMHI. Each plot has a title and labeled axes, and the shape of each distribution is clearly visible. I particularly liked the GMHI histogram because its two peaks may indicate the existence of two subgroups (e.g., healthy and diseased individuals).
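The `stat_bin()` message printed under each histogram appears because no bin width was chosen. The sketch below shows one way to set it explicitly. It uses simulated two-group data as a stand-in for the real metadata; the column name `index`, the group means, and the bin width of 0.5 are all assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the metadata: 200 simulated values drawn from
# two groups, so that the histogram has two peaks.
set.seed(1)
toy_meta <- data.frame(
  index = c(rnorm(100, mean = -2, sd = 0.7),
            rnorm(100, mean = 2, sd = 0.7))
)

# Setting binwidth explicitly replaces the default of 30 bins,
# so the stat_bin() message is no longer printed.
p <- ggplot(toy_meta, aes(x = index)) +
  geom_histogram(binwidth = 0.5, color = "black", fill = "blue") +
  labs(title = "Histogram of a simulated index",
       x = "Index value",
       y = "Frequency")

print(p)
```

A suitable bin width depends on the range of the real variable, so the value used here would need adjusting for the actual columns.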
Then I joined the taxa table with the metadata, so that the values for each strain can be linked to the disease category of the sample they come from.
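WARNING: When joining tables, I sometimes get a column named category.x instead of category. This usually happens when both tables contain a column called category that is not used as the join key: the join then adds the suffixes .x and .y to tell the two columns apart. If a code block isn’t working, try switching to the alternative column name.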
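The suffix behaviour can be reproduced on a small example. This is only a sketch with hypothetical toy tables (`meta_toy`, `taxa_toy`) and made-up values, and it assumes the join was done with `dplyr::left_join()` on `sample.id`; the real tables and join key may differ.

```r
library(dplyr)

# Hypothetical toy tables; both contain a non-key column named "category".
meta_toy <- data.frame(sample.id = c("s1", "s2"),
                       category  = c("Healthy", "IGT"))
taxa_toy <- data.frame(sample.id = c("s1", "s2"),
                       category  = c("gut", "gut"),
                       Values    = c(0.12, 0.30))

# Joining on sample.id only: the clashing column gets .x and .y suffixes.
joined <- left_join(taxa_toy, meta_toy, by = "sample.id")
print(names(joined))

# Dropping the clashing column before the join keeps a plain "category".
joined_clean <- taxa_toy %>%
  select(-category) %>%
  left_join(meta_toy, by = "sample.id")
print(names(joined_clean))
```

Checking `names()` on the joined table right after the join shows which of the two column names a later code block has to use.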
I selected one of the diseases to focus on in later analyses. IGT is one of the most common categories in my database, which is why I chose to work with it. It also makes a good example because, despite having fewer samples than the healthy group, the relationship is still clearly visible. Unfortunately, processing larger datasets caused my RStudio to crash.
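Restricting the data to one category before plotting also keeps the table small. A minimal sketch of that step follows. It assumes a wide taxa table with one row per strain and one column per sample, which may not match the real file. The toy tables and their values are hypothetical; only the column names `Trains`, `sample.id`, `Values`, and `category` are taken from this report.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per strain, one column per sample.
taxa_toy <- data.frame(
  Trains = c("strain_A", "strain_B"),
  s1 = c(0.10, 0.00),
  s2 = c(0.25, 0.05),
  s3 = c(0.00, 0.40)
)
meta_toy <- data.frame(
  sample.id = c("s1", "s2", "s3"),
  category  = c("Healthy", "IGT", "IGT")
)

# Long format: one row per (strain, sample) pair.
taxa_long <- taxa_toy %>%
  pivot_longer(cols = -Trains, names_to = "sample.id", values_to = "Values")

# Attach the disease category and keep only the IGT samples.
taxa_IGT_toy <- taxa_long %>%
  left_join(meta_toy, by = "sample.id") %>%
  filter(category == "IGT")

print(taxa_IGT_toy)
```

Filtering as early as possible means fewer rows are carried through the later steps, which may help with the memory problems mentioned above.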
This plot shows the density of bacteria in IGT samples. The distinct peak for certain bacterial strains may point to a dominant microbial signature associated with IGT. Plotly made the visualization interactive, which let me identify the most densely represented bacteria. I omitted the legend because the focus was on overall trends rather than on individual strain identities. Later in the project, I quantified the most prevalent bacteria and presented the results in a table to highlight key differences between healthy and diseased populations.
plot_ly(data = taxa_IGT, y = ~Values, x = ~sample.id, type = 'scatter', color = ~Trains, alpha = 0.7) %>%
  layout(title = "Density of Bacteria in IGT",
         xaxis = list(title = "Bacteria"),
         yaxis = list(title = "Values"),
         showlegend = FALSE)
No scatter mode specifed:
Setting the mode to markers
Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
Warning: Ignoring 240121 observations
Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
Returning the palette you asked for with that many colors
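Two of the messages above have straightforward causes: the scatter mode was not set, and the default Set2 palette has only 8 colours while there are more strains than that. The sketch below shows one possible way to address both, using simulated data with hypothetical strain names. The choice of `hcl.colors()` with the "Dark 3" palette is an assumption for illustration, not the method used in the project, and the "Ignoring ... observations" warning is not covered here.

```r
library(plotly)

# Hypothetical long-format data: 10 strains measured in 5 samples.
set.seed(1)
toy <- expand.grid(
  Trains = paste0("strain_", 1:10),
  sample.id = paste0("s", 1:5),
  stringsAsFactors = FALSE
)
toy$Values <- runif(nrow(toy))

# One colour per strain, so the 8-colour limit of Set2 should not be reached.
strain_colors <- grDevices::hcl.colors(length(unique(toy$Trains)),
                                       palette = "Dark 3")

fig <- plot_ly(
  data = toy,
  x = ~sample.id, y = ~Values,
  type = "scatter", mode = "markers",  # explicit mode avoids the mode message
  color = ~Trains, colors = strain_colors,
  alpha = 0.7
) %>%
  layout(
    title = "Values per sample (simulated)",
    xaxis = list(title = "Sample ID"),
    yaxis = list(title = "Values"),
    showlegend = FALSE
  )

fig
```

With many strains, neighbouring colours become hard to tell apart, so hovering over points in the interactive plot remains the practical way to identify a strain.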